METHOD AND SYSTEM FOR INTER-CHANNEL SIGNAL REDUNDANCY
REMOVAL IN PERCEPTUAL AUDIO CODING

Cross References to Related Applications 
The instant application is related to a previously filed patent application, Serial No.
09/612,207, assigned to the assignee of the instant application, and filed July 7, 2000, which
is incorporated herein by reference.

Field of the Invention 

The present invention relates generally to audio coding and, in particular, to the
coding technique used in a multiple-channel, surround sound system.

Background of the Invention 

As is well known in the art, the International Organization for Standardization (ISO)
founded the Moving Picture Experts Group (MPEG) with the intention to develop and
standardize compression algorithms for video and audio signals. Among several existing
multichannel audio compression algorithms, MPEG-2 Advanced Audio Coding (AAC) is
currently the most powerful one in the MPEG family; it supports up to 48 audio channels
and perceptually lossless audio at 64 kbits/s per channel. One of the driving forces to develop
the AAC algorithm has been the quest for an efficient coding method for surround sound
signals, such as 5-channel signals including left (L), right (R), center (C), left-surround (LS)
and right-surround (RS) signals, as shown in Figure 1. Additionally, an optional
low-frequency enhancement (LFE) channel is also used.

Generally, an N-channel surround sound system running with a bit rate of M bps/ch
does not necessarily have a total bit rate of M×N bps; rather, the overall bit rate drops
significantly below M×N bps due to cross-channel (inter-channel) redundancy. To exploit the
inter-channel redundancy, two methods have been used in MPEG-2 AAC standards:
Mid-Side (MS) Stereo Coding and Intensity Stereo Coding/Coupling. Coupling is adopted based
on psychoacoustic evidence that at high frequencies (above approximately 2 kHz), the human
auditory system localizes sound based primarily on the "envelopes" of critical-band-filtered
versions of the signals reaching the ears, rather than the signals themselves. MS stereo
coding encodes the sum and the difference of the signals in two symmetric channels instead of
the original signals in the left and right channels.
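
The sum/difference idea behind MS stereo coding can be illustrated with a minimal sketch. The 1/2 scaling and the function names `ms_encode` and `ms_decode` are assumptions chosen for illustration; they are not taken from the MPEG-2 AAC standard.

```python
def ms_encode(left, right):
    """Return (mid, side): half the sum and half the difference of two channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover (left, right) from the mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two strongly correlated channels: the side signal is mostly zero,
# which is what makes the pair cheaper to code.
left = [4.0, 6.0, -2.0]
right = [4.0, 2.0, -2.0]
mid, side = ms_encode(left, right)
```

When the two channels are nearly identical, most of the energy ends up in the mid signal and the side signal stays small.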

Both the MS Stereo and Intensity Stereo coding methods operate on Channel-Pair
Elements (CPEs), as shown in Figure 1. In Figure 1, the signals in channel pairs
are denoted by ($100_L$, $100_R$) and ($100_{LS}$, $100_{RS}$). The rationale behind the
application of stereo audio coding is based on the fact that the human auditory system, as
well as a stereo recording system, uses two audio signal detectors. While a human being has
two ears, a stereo recording system has two microphones. With these two audio signal
detectors, the human auditory system or the stereo recording system receives and records an
audio signal from the same source twice, once through each audio signal detector. The two
sets of recorded data of the audio signal from the same source contain time and signal level
differences caused mainly by the positions of the detectors in relation to the source.

It is believed that the human auditory system itself is able to detect and discard the
inter-channel redundancy, thereby avoiding extra processing. At low frequencies, the human
auditory system locates sound sources mainly based on the inter-aural time difference (ITD)
of the arrived signals. At high frequencies, the difference in signal strength or intensity level
at both ears, or inter-aural level difference (ILD), is the major cue. In order to remove the
redundancy in the received signals in a stereo sound system, the psychoacoustic model
analyzes the received signals in consecutive time blocks and determines for each block the
spectral components of the received audio signal in the frequency domain in order to remove
certain spectral components, thereby mimicking the masking properties of the human auditory
system. Like any perceptual audio coder, the MPEG audio coder does not attempt to retain
the input signal exactly after encoding and decoding; rather, its goal is to reduce the amount of
audio data while keeping the output signals similar to what the human auditory system
might perceive. Thus, the MS Stereo coding technique applies a matrix to the signals of the
(L, R) or (LS, RS) pair in order to compute the sum and difference of the two original
signals, dealing mainly with the spectral image at the mid-frequency range. Intensity Stereo
coding replaces the left and the right signals by a single representative signal plus directional
information.

While conventional audio coding techniques can reduce a significant amount of
channel redundancy in channel pairs (L/R or LS/RS) based on the dual channel correlation,
they may not be efficient in coding audio signals when a large number of channels are used in
a surround sound system.

It is advantageous and desirable to provide a more efficient encoding system and 
method in order to further reduce the redundancy in the stereo sound signals. In particular, 
the method can be advantageously applied to a surround sound system having a large number 
of sound channels (6 or more, for example). Such system and method can also be used in 
audio streaming over Internet Protocol (IP) for personal computer (PC) users, mobile IP and 
third-generation (3G) systems for mobile laptop users, digital radio, digital television, and 
digital archives of movie sound tracks and the like. 

Summary of the Invention 

The primary object of the present invention is to improve the efficiency in encoding 
audio signals in a sound system in order to reduce the amount of audio data for transmission 
or storage. 

Accordingly, the first aspect of the present invention is a method of coding audio
signals in a sound system having a plurality of sound channels for providing M sets of audio
signals from input signals, wherein M is a positive integer greater than 2, and wherein a
plurality of intra-channel signal redundancy removal devices are used to reduce the audio
signals for providing first signals indicative of the reduced audio signals. The method
comprises the steps of:

converting the first signals to data streams of integers for providing second signals 
indicative of the data streams; and 

reducing inter-channel signal redundancy in the second signals for providing third 
signals indicative of the reduced second signals. 

Preferably, when the coding efficiency in the second signals is representable by a first
value and the coding efficiency in the third signals is representable by a second value, the
method further comprises the step of comparing the first value with the second value for
determining whether the reducing step is carried out.

Preferably, the audio signals from which the intra-channel signal redundancy is
removed are provided in a form of pulse code modulation samples.

Preferably, the intra-channel signal redundancy removal is carried out by a modified 
discrete cosine transform operation. 

Preferably, the inter-channel signal redundancy reduction is carried out in an integer- 
to-integer discrete cosine transform operation. 

Preferably, the inter-channel signal redundancy reduction is carried out in order to
reduce redundancy in the audio signals in L channels, wherein L is a positive integer greater
than 2 but smaller than M+1.

Preferably, the method further includes a signal masking process according to a 
psychoacoustic model simulating a human auditory system for providing a masking threshold 
in the converting step. 

Preferably, the method further includes the step of converting the reduced second
signals into a bitstream for transmission or storage.

The second aspect of the present invention is a system for coding audio
signals in a sound system having a plurality of sound channels for providing M sets of audio
signals from input signals, wherein M is a positive integer greater than 2, and wherein a
plurality of intra-channel signal redundancy removal devices are used to reduce the audio
signals for providing first signals indicative of the reduced audio signals. The system
comprises:

means, responsive to the first signals, for converting the first signals to data streams of 
integers for providing second signals indicative of data streams; and 

means, responsive to the second signals, for reducing inter-channel signal redundancy 
in the second signals for providing third signals indicative of the reduced second signals. 

Preferably, when the coding efficiency in the second signals is representable by a first 
value and the coding efficiency in the third signals is representable by a second value, the 
system further comprises means for comparing the first value with the second value for 
determining whether the second signals or the third signals are used to form a bitstream for 
transmission or storage. 

Preferably, the audio signals from which the intra-channel signal redundancy is
removed are provided in a form of pulse code modulation samples.

Preferably, the intra-channel signal redundancy removal is carried out by a modified 
discrete cosine transform operation. 

Preferably, the inter-channel signal redundancy reduction is carried out in an
integer-to-integer discrete cosine transform operation.

Preferably, the inter-channel signal redundancy reduction is carried out in order to
reduce redundancy in the audio signals in L channels, wherein L is a positive integer greater
than 2 but smaller than M+1.

Preferably, the system further includes means for providing a masking threshold
according to a psychoacoustic model simulating a human auditory system, wherein the
masking threshold is used for masking the first signals in the converting thereof into the data
streams.

The present invention will become apparent upon reading the description taken in 
conjunction with Figures 3 to 5. 

Brief Description of the Drawings 

Figure 1 is a diagrammatic representation illustrating a conventional audio coding 
method for a surround sound system. 

Figure 2 is a diagrammatic representation illustrating an audio coding method for
inter-channel signal redundancy reduction, wherein a discrete cosine transform operation is
carried out prior to signal quantization.

Figure 3 is a diagrammatic representation illustrating an audio coding method for 
inter-channel signal redundancy reduction, according to the present invention. 

Figure 4a is a diagrammatic representation illustrating the audio coding method,
according to the present invention, using an M channel integer-to-integer discrete cosine
transform in an M channel sound system.

Figure 4b is a diagrammatic representation illustrating the audio coding method,
according to the present invention, using an L channel integer-to-integer discrete cosine
transform in an M channel sound system, where L<M.

Figure 4c is a diagrammatic representation illustrating how the MDCT coefficients are
divided into a plurality of scale factor bands.

Figure 4d is a diagrammatic representation illustrating the audio coding method,
according to the present invention, using two groups of integer-to-integer discrete cosine
transform modules in an M channel sound system.

Figure 5 is a block diagram illustrating a system for audio coding, according to the 
present invention. 

Detailed Description 

The present invention improves the coding efficiency in audio coding for a sound 
system having M sound channels for sound reproduction, wherein M is greater than 2. In the 
method of the present invention, the individual or intra-channel masking thresholds for each 
of the sound channels are calculated in a fashion similar to a basic Advanced Audio Coding 
(AAC) encoder. This method is herein referred to as the intra-channel signal redundancy 
method. Basically, input signals are first converted into pulse code modulation (PCM)
samples and these samples are processed by a plurality of modified discrete cosine transform
(MDCT) devices. According to a previously filed patent application, Serial No. 09/612,207,
the MDCT coefficients from the multiple channels are further processed by a plurality of 
discrete cosine transform (DCT) devices in a cascaded manner to reduce inter-channel signal 
redundancy. The reduced signals are quantized according to the masking threshold calculated 
using a psychoacoustic model and converted into a bitstream for transmission or storage, as 
shown in Figure 2. While this method can reduce the inter-channel signal redundancy, 
mathematically it is a challenge to relate the threshold requirements for each of the original 
channels in the MDCT domain to the inter-channel transformed domain (MDCT x DCT). 

The present invention takes a different approach. Instead of carrying out the discrete 
cosine transform to reduce inter-channel signal redundancy directly from the modified 
discrete cosine transform coefficients, the modified discrete cosine transform coefficients are 
quantized according to the masking threshold calculated using the psychoacoustic model prior 
to the removal of cross-channel redundancy. As such, the discrete cosine transform for
cross-channel redundancy removal can be represented by an M×M orthogonal matrix, which
can be factorized into a series of Givens rotations.
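
As a rough illustration of the orthogonality property relied on above, the sketch below builds an M×M DCT matrix and checks that its rows are orthonormal. The text does not say which DCT variant is used; the orthonormal DCT-II here is an assumption, and the helper names `dct_matrix` and `gram` are hypothetical.

```python
import math


def dct_matrix(m):
    """Orthonormal DCT-II matrix of size m x m (assumed variant); rows are basis vectors."""
    rows = []
    for k in range(m):
        scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * m))
                     for n in range(m)])
    return rows


def gram(a):
    """Compute A * A^T, which is the identity matrix when A is orthogonal."""
    m = len(a)
    return [[sum(a[i][n] * a[j][n] for n in range(m)) for j in range(m)]
            for i in range(m)]


A = dct_matrix(5)   # e.g. a 5-channel system
G = gram(A)         # numerically close to the 5 x 5 identity
```

Because the matrix is orthogonal, its inverse is simply its transpose.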

Unlike the conventional coding method, the present invention relies on the
integer-to-integer discrete cosine transform (INT-DCT) of the modified discrete cosine transform
(MDCT) coefficients, after the MDCT coefficients are quantized into integers. As shown in
Figure 3, the audio coding system 10 comprises a modified discrete cosine transform
(MDCT) unit 30 to reduce intra-channel signal redundancy in the input pulse code
modulation (PCM) samples 100. The outputs of the MDCT unit 30 are modified discrete
cosine transform (MDCT) coefficients 110. These coefficients, representing a 2-D spectral
image of the audio signal, are quantized by a quantization unit 40 into quantized MDCT
coefficients 120. In addition, a masking mechanism 50, based on a so-called psychoacoustic
model, is used to remove the audio data believed not to be used by a human auditory system. As
shown in Figure 3, the masking mechanism 50 is operatively connected to the quantization
unit 40 for masking out the audio data in the intra-channel MDCT domain. The
masked 2-D spectral image is quantized according to the masking threshold calculated using
the psychoacoustic model. In order to reduce the cross-channel redundancy, an INT-DCT
unit 60 is used to perform INT-DCT inter-channel decorrelation. The processed MDCT
coefficients are collectively denoted by reference numeral 130. The processed coefficients
130 are then Huffman coded and written into a bitstream 140 for transmission or storage.
Preferably, the coding system 10 also comprises a comparison device 80 to determine
whether to bypass the INT-DCT unit 60 based on the cross-channel redundancy removal
efficiency of the INT-DCT unit 60 at certain frequency bands (see Figure 4c and Figure 5). As
shown in Figure 3, the coding efficiency in the signals 120 and that in the signals 130 are
denoted by reference numerals 122 and 126, respectively. If the coding efficiency 126 is not
greater than the coding efficiency 122 at certain frequency bands, the comparison device 80
sends a signal 124 to effect the bypass of the INT-DCT unit 60 for those frequency
bands.

It should be noted that in an M channel sound system, according to the present
invention, the inter-channel signal redundancy in the quantized MDCT coefficients can be
reduced by one or more INT-DCT units. As shown in Figure 4a, a group of M-tap INT-DCT
modules $60_1, \ldots, 60_{N-1}, 60_N$ are used to process the quantized MDCT coefficients
$120_1, 120_2, 120_3, \ldots, 120_{M-1}$, and $120_M$. After the inter-channel signal redundancy
is reduced, the coefficients representing the sound signals are denoted by reference numerals
$130_1, 130_2, 130_3, \ldots, 130_{M-1}$, and $130_M$. It is also possible to use a group of L-tap
INT-DCT modules $60_1', \ldots, 60_{N-1}', 60_N'$ to reduce the inter-channel signal redundancy
in L channels, where 2<L<M, as shown in Figure 4b. For example, in a 5-channel sound system
consisting of left (L), right (R), center (C), left-surround (LS) and right-surround (RS)
channels, it is possible to perform the integer-to-integer DCT of the quantized MDCT
coefficients involving only 4 channels, namely L, R, LS and RS. Likewise, in a 12-channel
sound system, it is possible to perform the inter-channel decorrelation in 5 or 6 channels.

Figure 5 shows the audio coding system 10 of the present invention in more detail. As
shown in Figure 5, each of M MDCT devices $30_1, 30_2, \ldots, 30_M$ is used to obtain
the MDCT coefficients from a block of 2N pulse code modulation (PCM) samples for one of
the M audio channels (not shown). Thus, the total number of PCM samples for M channels is
M×2N. This block of PCM samples is collectively denoted by reference numeral 100. It is
understood that the M×2N PCM samples may have been pre-processed by a group of M Shifted
Discrete Fourier Transform (SDFT) devices (not shown) prior to being conveyed to the
MDCT devices $30_1, 30_2, \ldots, 30_M$ to perform the intra-channel decorrelation. When a
block of 2N samples (2N being the transform length) is used to compute a series of MDCT
coefficients, the maximum number of INT-DCT devices in each stage is equal to the number
of MDCT coefficients for each channel. The transform length 2N is determined by transform
gain, computational complexity and the pre-echo problem. With a transform length of 2N,
the number of the MDCT coefficients for each channel is N. Typically, the MDCT transform
length 2N is between 256 and 2048, resulting in 128 (short window) to 1024 (long window)
MDCT coefficients. Accordingly, the number of INT-DCT devices required to remove
cross-channel redundancy at each stage is between 128 and 1024. In practice, however, the
number of INT-DCT units can be much smaller. As shown in Figure 5, only P INT-DCT units
$60_1, 60_2, \ldots, 60_P$ (P<N) are used to remove cross-channel signal redundancy after the
MDCT coefficients are quantized by quantization units $40_1, 40_2, \ldots, 40_M$ into
quantized MDCT coefficients. The MDCT coefficients are denoted by reference numerals
$110_{j1}, 110_{j2}, 110_{j3}, \ldots, 110_{j(N-1)}$, and $110_{jN}$,
where j denotes the channel number. The quantized MDCT coefficients are denoted by
reference numerals $120_{j1}, 120_{j2}, 120_{j3}, \ldots, 120_{j(N-1)}$, and $120_{jN}$. After
INT-DCT processing, the audio signals are collectively denoted by reference numeral 130,
Huffman coded and written to a bitstream 140 by a bitstream formatter 70.

It should be noted that each MDCT device transforms the audio signals in the time
domain into audio signals in the frequency domain. The audio signals in certain
frequency bands may not produce noticeable sound in the human auditory system. According
to the coding principle of MPEG-2 Advanced Audio Coding (AAC), the N MDCT
coefficients for each channel are divided into a plurality of scale factor bands (SFB), modeled
after the human auditory system. The scale factor bandwidth increases with frequency
roughly according to one-third octave bandwidth. As shown in Figure 4c, the N MDCT
coefficients for each channel are divided into SFB1, SFB2, ..., SFBK for further processing by
N INT-DCT units. With N=128 (short window), K=14. With N=1024 (long window), K=49.
The total bits needed to represent the MDCT coefficients within each SFB for all channels
are calculated before and after the INT-DCT cross-channel redundancy removal. Let the
number of total bits for all channels before and after INT-DCT processing be BR1 and BR2,
as conveyed by signal 122 and signal 126, respectively. The comparison device 80,
responsive to signals 122 and 126, compares BR1 and BR2 for each SFB. If BR1>BR2 for
an SFB, then the INT-DCT unit for that SFB is used to reduce the cross-channel redundancy.
Otherwise, the INT-DCT unit for that SFB can be bypassed; that is, the cross-channel
redundancy removal process for that SFB is not carried out. In order to bypass the INT-DCT
unit, the comparison device 80 sends a signal 124 for effecting the bypass in the encoder. It
should be noted that it is necessary for the encoder to inform the decoder whether or not
INT-DCT is used for an SFB, so that the decoder knows whether an inverse INT-DCT is needed.
The information sent to the decoder is known as side information. The side information for
each SFB is only one bit, added to the bitstream 140 for transmission or storage.
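
The per-band decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: `estimate_bits` is a hypothetical cost model standing in for the Huffman bit count, `choose_per_band` is a hypothetical name, and the coefficient values are made-up numbers rather than the output of an actual INT-DCT.

```python
def estimate_bits(coeffs):
    """Hypothetical cost model: magnitude bits plus one sign bit per coefficient.
    A real encoder would count Huffman code lengths instead."""
    return sum(abs(c).bit_length() + 1 for c in coeffs)


def choose_per_band(before, after):
    """before/after map an SFB index to the integer coefficients of all channels
    without and with the inter-channel transform.
    Returns (flags, chosen); flags[sfb] is the 1-bit side information."""
    flags, chosen = {}, {}
    for sfb in before:
        br1 = estimate_bits(before[sfb])   # BR1: bits before the transform
        br2 = estimate_bits(after[sfb])    # BR2: bits after the transform
        use_transform = br1 > br2          # keep the transform only if it saves bits
        flags[sfb] = 1 if use_transform else 0
        chosen[sfb] = after[sfb] if use_transform else before[sfb]
    return flags, chosen


# Illustrative (made-up) coefficients for two scale factor bands.
before = {0: [10, 9, 11], 1: [3, -4, 0]}
after = {0: [17, 1, -1], 1: [-1, 5, 3]}
flags, chosen = choose_per_band(before, after)
```

In this example band 0 keeps the transformed coefficients and band 1 is bypassed; the decoder would read the flag for each band to know whether to apply the inverse transform.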

Because of the energy compaction properties of the MDCT, the MDCT coefficients in
high frequencies are mostly zeros. In order to save computation and side information, the P
INT-DCT units may be applied to low and middle frequencies only.

Each of the INT-DCT devices is used to perform an integer-to-integer discrete cosine
transform represented by an orthogonal transform matrix A. Let x be an M×1 input vector
representing M quantized MDCT coefficients $110_{1k}, 110_{2k}, 110_{3k}, \ldots, 110_{Mk}$; then
Ax is an M×1 output vector representing M INT-DCT coefficients
$120_{1k}, 120_{2k}, 120_{3k}, \ldots, 120_{Mk}$. The integer-to-integer transform is created by
first factorizing the transform matrix A into a plurality of
matrices that have 1's on the diagonal and non-zero off-diagonal elements only in one row or
column. It has been found that the factorization is not unique. Thus, it is possible to use
elementary matrices to reduce the transform matrix A into a unit matrix, if possible, and then
use the inverses of the elementary matrices as the factorization. Because the transform matrix
A is orthogonal, it is possible to factorize the transform matrix A into Givens matrices and
then further factorize each of the Givens matrices into three matrices that can be used as
building blocks of the integer-to-integer transform. For simplicity, a sound system having
M=3 channels is used to demonstrate the INT-DCT cross-channel decorrelation, according to
the present invention.

A matrix that has 1's on the diagonal and nonzero off-diagonal elements only in one
row or column can be used as a building block when constructing an integer-to-integer
transform. This is called 'the lifting scheme'. Such a matrix remains invertible even when
the result is rounded in order to map integers to integers.
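
The invertibility of a rounded lifting step can be checked with a short sketch. The function names `lift_forward` and `lift_inverse` and the sample values of a and b are assumptions for illustration. The inverse here subtracts the same rounded quantity that the forward step added, so the round trip is exact whatever tie-breaking rule the rounding uses.

```python
def lift_forward(x, a, b):
    """Rounded lifting step on an integer triple (x1, x2, x3):
    only x2 changes, by the rounded value of a*x1 + b*x3."""
    x1, x2, x3 = x
    return (x1, x2 + round(a * x1 + b * x3), x3)


def lift_inverse(y, a, b):
    """Undo lift_forward: x1 and x3 are unchanged, so the same rounded
    quantity can be recomputed and subtracted."""
    y1, y2, y3 = y
    return (y1, y2 - round(a * y1 + b * y3), y3)


a, b = 0.37, -1.25          # arbitrary real coefficients
x = (7, -3, 12)             # integer input
y = lift_forward(x, a, b)   # integer output
x_back = lift_inverse(y, a, b)
```

The key point is that the modified component depends only on components that the step leaves untouched.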

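Let us consider the case of a 3 × 3 matrix ($a, b \in \mathbb{R}$, $x_i \in \mathbb{Z}$), with the
result rounded so that integers map to integers:

$$
\left|\,
\begin{bmatrix} 1 & 0 & 0 \\ a & 1 & b \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\right|_\Delta
=
\begin{bmatrix} x_1 \\ |a x_1 + x_2 + b x_3|_\Delta \\ x_3 \end{bmatrix}
=
\begin{bmatrix} x_1 \\ x_2 + |a x_1 + b x_3|_\Delta \\ x_3 \end{bmatrix}
\tag{1}
$$

where $|\cdot|_\Delta$ denotes rounding to the nearest integer. The inverse of (1) is

$$
\left|\,
\begin{bmatrix} 1 & 0 & 0 \\ -a & 1 & -b \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 + |a x_1 + b x_3|_\Delta \\ x_3 \end{bmatrix}
\right|_\Delta
=
\begin{bmatrix} x_1 \\ \left|-a x_1 + x_2 + |a x_1 + b x_3|_\Delta - b x_3\right|_\Delta \\ x_3 \end{bmatrix}
=
\begin{bmatrix} x_1 \\ x_2 + \left|-a x_1 + |a x_1 + b x_3|_\Delta - b x_3\right|_\Delta \\ x_3 \end{bmatrix}
=
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\tag{2}
$$

A Givens rotation is a matrix of the form:

$$
G(i,k,\theta) =
\begin{bmatrix}
I & 0 & 0 & 0 & 0 \\
0 & c & 0 & s & 0 \\
0 & 0 & I & 0 & 0 \\
0 & -s & 0 & c & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix}
\tag{3}
$$

where $c = \cos(\theta)$ and $s = \sin(\theta)$; the entries $c$ lie at positions $(i,i)$ and $(k,k)$,
$s$ lies at $(i,k)$, and $-s$ lies at $(k,i)$.

A Givens matrix is clearly orthogonal and the inverse is

$$
G(i,k,\theta)^{-1} =
\begin{bmatrix}
I & 0 & 0 & 0 & 0 \\
0 & c & 0 & -s & 0 \\
0 & 0 & I & 0 & 0 \\
0 & s & 0 & c & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix}
\tag{4}
$$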



Any m × m orthogonal matrix can be factorized into m(m-1)/2 Givens rotations and m
sign parameters.

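As an example, let A be a 3 × 3 orthogonal matrix with elements $a_{ij}$.

Firstly, $\theta_1$ can be chosen such that $\tan(\theta_1) = a_{23}/a_{33}$. It follows that

$$
G(2,3,\theta_1)^{-1} \cdot A =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta_1) & -\sin(\theta_1) \\ 0 & \sin(\theta_1) & \cos(\theta_1) \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & 0 \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
= B
\tag{5}
$$

If $a_{33} = 0$, then $\theta_1 = \pi/2$, i.e. $\cos(\theta_1) = 0$, $\sin(\theta_1) = 1$, is chosen. This matrix
still has an inverse, even when used to create an integer-to-integer transform.

Secondly, $\theta_2$ is chosen such that $\tan(\theta_2) = b_{13}/b_{33}$:

$$
G(1,3,\theta_2)^{-1} \cdot B =
\begin{bmatrix} \cos(\theta_2) & 0 & -\sin(\theta_2) \\ 0 & 1 & 0 \\ \sin(\theta_2) & 0 & \cos(\theta_2) \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & 0 \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\begin{bmatrix} c_{11} & c_{12} & 0 \\ b_{21} & b_{22} & 0 \\ c_{31} & c_{32} & c_{33} \end{bmatrix}
= C
\tag{6}
$$

Now, since $G(2,3,\theta_1)^{-1}$, $G(1,3,\theta_2)^{-1}$ and A are all orthogonal, C has to
be orthogonal, and every row and column in C has unit norm. Thus, $c_{33} = \pm 1$ and
$c_{31} = c_{32} = 0$:

$$
C =
\begin{bmatrix} c_{11} & c_{12} & 0 \\ b_{21} & b_{22} & 0 \\ 0 & 0 & \pm 1 \end{bmatrix}
\tag{7}
$$

Lastly, $\theta_3$ is chosen such that $\tan(\theta_3) = c_{12}/b_{22}$:

$$
G(1,2,\theta_3)^{-1} \cdot C =
\begin{bmatrix} \cos(\theta_3) & -\sin(\theta_3) & 0 \\ \sin(\theta_3) & \cos(\theta_3) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_{11} & c_{12} & 0 \\ b_{21} & b_{22} & 0 \\ 0 & 0 & \pm 1 \end{bmatrix}
=
\begin{bmatrix} d_{11} & 0 & 0 \\ d_{21} & d_{22} & 0 \\ 0 & 0 & \pm 1 \end{bmatrix}
= D
\tag{8}
$$

Since $G(1,2,\theta_3)^{-1}$ and C are orthogonal, D must be orthogonal:

$$
D =
\begin{bmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{bmatrix}
$$

Finally:

$$
G(1,2,\theta_3)^{-1} \cdot G(1,3,\theta_2)^{-1} \cdot G(2,3,\theta_1)^{-1} \cdot A = D
\tag{9}
$$

Taking D as the sign matrix:

$$
D \cdot G(1,2,\theta_3)^{-1} \cdot G(1,3,\theta_2)^{-1} \cdot G(2,3,\theta_1)^{-1} \cdot A = I
\tag{10}
$$

Therefore, A can be factorized as:

$$
A = G(2,3,\theta_1) \cdot G(1,3,\theta_2) \cdot G(1,2,\theta_3) \cdot D
\tag{11}
$$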
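
The 3 × 3 factorization above can be checked numerically with a small sketch. It uses 0-based indices, so `givens(1, 2, t)` corresponds to G(2,3,θ) in the text. The helper names are hypothetical, the test matrix is an arbitrary orthogonal matrix chosen for illustration, and `math.atan2` is used as an assumed, convenient way of picking angles that satisfy the tangent conditions (it also covers the zero-denominator case, though not necessarily with the same angle as the text).

```python
import math


def givens(i, k, theta, m=3):
    """G(i, k, theta) with 0-based indices: c at (i,i) and (k,k), s at (i,k), -s at (k,i)."""
    g = [[1.0 if r == col else 0.0 for col in range(m)] for r in range(m)]
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][k] = c, s
    g[k][i], g[k][k] = -s, c
    return g


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][n] * b[n][col] for n in range(len(b)))
             for col in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    """Transpose; for a Givens matrix this is also its inverse."""
    return [list(row) for row in zip(*a)]


def factorize3(a):
    """Return (t1, t2, t3, d) such that A = G(2,3,t1) G(1,3,t2) G(1,2,t3) D."""
    t1 = math.atan2(a[1][2], a[2][2])              # zeroes b23
    b = matmul(transpose(givens(1, 2, t1)), a)
    t2 = math.atan2(b[0][2], b[2][2])              # zeroes c13
    c = matmul(transpose(givens(0, 2, t2)), b)
    t3 = math.atan2(c[0][1], c[1][1])              # zeroes d12
    d = matmul(transpose(givens(0, 1, t3)), c)
    return t1, t2, t3, d


# An arbitrary orthogonal test matrix (rows are orthonormal).
A = [[2 / 3, -2 / 3, 1 / 3],
     [1 / 3, 2 / 3, 2 / 3],
     [2 / 3, 1 / 3, -2 / 3]]
t1, t2, t3, D = factorize3(A)
recon = matmul(matmul(matmul(givens(1, 2, t1), givens(0, 2, t2)),
                      givens(0, 1, t3)), D)
```

Up to floating-point error, `D` comes out diagonal with entries of magnitude 1, and `recon` reproduces `A`.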



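For m × m matrices, the operation is similar. Givens rotations can in turn be
factorized as follows. Restricted to rows and columns i and k (all other rows and columns
being those of the identity matrix),

$$
G(i,k,\theta):\quad
\begin{bmatrix} c & s \\ -s & c \end{bmatrix}
=
\begin{bmatrix} 1 & (1-c)/s \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -s & 1 \end{bmatrix}
\begin{bmatrix} 1 & (1-c)/s \\ 0 & 1 \end{bmatrix}
\tag{12}
$$

when $\theta$ is not an integral multiple of $2\pi$. If it is, then the Givens rotation matrix equals the
unity matrix and no factorization is necessary. These factors are denoted as $G(i,k,\theta)_1$,
$G(i,k,\theta)_2$ and $G(i,k,\theta)_3$. A transform that behaves similarly to matrix A, maps integers to
integers and is reversible is then

$$
\Big|\, G(2,3,\theta_1)_1 \cdot \Big|\, G(2,3,\theta_1)_2 \cdot \Big|\, G(2,3,\theta_1)_3 \cdots
\Big|\, G(1,2,\theta_3)_1 \cdot \Big|\, G(1,2,\theta_3)_2 \cdot \Big|\, G(1,2,\theta_3)_3 \cdot D \cdot x
\,\Big|_\Delta \Big|_\Delta \Big|_\Delta \cdots \Big|_\Delta \Big|_\Delta \Big|_\Delta
\tag{13}
$$

where x is the integer 3 × 1 input vector.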
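
One rotation of the rounded chain in (13) can be sketched as three lifting steps acting on the two affected components. The function names are hypothetical. Treating angles where sin(θ) is numerically zero as an exact sign change is an assumption added here to avoid dividing by zero; the text itself only singles out multiples of 2π.

```python
import math


def int_rotate(xi, xk, theta):
    """Integer-to-integer version of [[c, s], [-s, c]] applied to (xi, xk)."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:                 # assumed special case: rotation is +I or -I
        sign = 1 if c > 0 else -1
        return sign * xi, sign * xk
    p = (1.0 - c) / s
    xi = xi + round(p * xk)            # factor G(i,k,theta)_3, applied first
    xk = xk + round(-s * xi)           # factor G(i,k,theta)_2
    xi = xi + round(p * xk)            # factor G(i,k,theta)_1, applied last
    return xi, xk


def int_rotate_inverse(yi, yk, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:
        sign = 1 if c > 0 else -1
        return sign * yi, sign * yk
    p = (1.0 - c) / s
    yi = yi - round(p * yk)
    yk = yk - round(-s * yi)
    yi = yi - round(p * yk)
    return yi, yk


theta = 0.7
y = int_rotate(12, -5, theta)          # integers in, integers out
x_back = int_rotate_inverse(y[0], y[1], theta)
```

The output stays close to the exact rotation of the input while remaining exactly reversible, which is the property the transform in (13) relies on.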

In order to remove cross-channel redundancy in L channels, an L×L orthogonal
transform matrix A is factorized into L(L-1)/2 Givens rotations. Givens rotations are further
factorized into 3 matrices each, resulting in the total of 3L(L-1)/2 matrix multiplications.
However, because of the internal structure of these matrices, only 3L(L-1)/2 multiplications
and 3L(L-1)/2 rounding operations are needed in total for each INT-DCT operation.

The efficiency of the cascaded INT-DCT coding process in removing cross-channel
redundancy, in general, increases with the number of sound channels involved. For example,
if a sound system consists of 6 or more surround sound speakers, then the reduction in
cross-channel redundancy using the INT-DCT processing is usually significant. However, if the
number of channels to be used in the INT-DCT processing is 2, then the efficiency may not
be improved at all. It should be noted that, like any perceptual audio coder, the goal of
cascaded INT-DCT processing is to reduce the audio data for transmission or storage. While
the processing method is intended to produce signal outputs similar to what a human auditory
system might perceive, its goal is not to replicate the input signals.

It should be noted that the so-called psychoacoustic model may consist of a certain 
perceptual model and a certain band mapping model. The surround sound encoding system 
may consist of components such as an AAC gain control and a certain long-term prediction 
model. However, these components are well known in the art and they can be modified, 
replaced or omitted. 

Furthermore, in an M-channel sound system, according to the present invention, the
inter-channel signal redundancy in the quantized MDCT coefficients can be reduced by a
number of groups of INT-DCT units. In the arrangement shown in Figure 4d, there is little or
no correlation between channels 1 to M' and channels M'+1 to M-1, so it is more meaningful to
perform INT-DCT for each group of channels separately. As shown, a group $L_1$ of M'-tap
INT-DCT modules $60''_1, \ldots, 60''_{N-1}, 60''_N$ and a group $L_2$ of (M-M'-1)-tap INT-DCT
modules $60'_1, \ldots, 60'_{N-1}, 60'_N$ are used to process the quantized MDCT coefficients
$120_1, 120_2, 120_3, \ldots, 120_{M-1}$, and $120_M$ in (M-1) channels. For example, in a cinema
having 8 front sound channels and 10 rear sound channels where there is little or no correlation
between the front and rear channels, it is desirable to process the sound signals in the front
channels and the rear channels separately. In this situation, it is possible to use a group of
8-tap INT-DCT modules to reduce the cross-channel signal redundancy in the 8 front channels
and a group of 10-tap INT-DCT modules to process the 10 rear channels. In general, it is
possible to use one, two or more groups of INT-DCT modules to reduce the cross-channel signal
redundancy in an M-channel sound system.

Thus, although the invention has been described with respect to a preferred 
embodiment thereof, it will be understood by those skilled in the art that the foregoing and 
various other changes, omissions and deviations in the form and detail thereof may be made 
without departing from the spirit and scope of this invention.